skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Creators/Authors contains: "Le, Phi"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Pep-TCRNet is a novel approach to constructing a prediction model that can evaluate the probability of recognition between a TCR and a peptide amino acid sequence while combining inputs such as TCR sequences, HLA types, and VJ genes.Pep-TCRNet operates in two key steps:Feature Engineering: This step processes different types of variables:TCR and peptide amino acid sequencing data: The model incorporates neural network architectures inspired by language representation models and graph representation model to learn the meaningful embeddings.Categorical data: Specialized encoding techniques are used to ensure optimal feature representation for HLA types and VJ genes.Prediction Model: The second step involves training a prediction model to evaluate the likelihood of a TCR recognizing a specific peptide, based on the features generated in the first step. 
    more » « less
  2. Exploring features associated with the clinical outcome of interest is a rapidly advancing area of research. However, with contemporary sequencing technologies capable of identifying over thousands of genes per sample, there is a challenge in constructing efficient prediction models that balance accuracy and resource utilization. To address this challenge, researchers have developed feature selection methods to enhance performance, reduce overfitting, and ensure resource efficiency. However, applying feature selection models to survival analysis, particularly in clinical datasets characterized by substantial censoring and limited sample sizes, introduces unique challenges. We propose a robust ensemble feature selection approach integrated with group Lasso to identify compelling features and evaluate its performance in predicting survival outcomes. Our approach consistently outperforms established models across various criteria through extensive simulations, demonstrating low false discovery rates, high sensitivity, and high stability. Furthermore, we applied the approach to a colorectal cancer dataset from The Cancer Genome Atlas, showcasing its effectiveness by generating a composite score based on the selected genes to correctly distinguish different subtypes of the patients. In summary, our proposed approach excels in selecting impactful features from high-dimensional data, yielding better outcomes compared to contemporary state-of-the-art models. 
    more » « less
  3. We present gOTzilla, a protocol for interactive zero-knowledge proofs for very large disjunctive statements of the following format: given publicly known circuit C, and set of values Y = {y1 , . . . , yn }, prove knowledge of a witness x such that C(x) = y1 ∨ C(x) = y2 ∨ · · · ∨ C(x) = yn . These type of statements are extremely important for the proof of assets (PoA) problem in cryptocurrencies where a prover wants to prove the knowledge of a secret key sk that associates with the hash of a public key H(pk) posted on the ledger. We note that the size of n in popular cryptocurrencies, such as Bitcoin, is estimated to 80 million. For the construction of gOTzilla, we start by observing that if we restructure the proof statement to an equivalent of proving knowledge of (x, y) such that (C(x) = y) ∧ (y = y1 ∨ · · · ∨ y = yn )), then we can reduce the disjunction of equalities to 1-out-of-N oblivious transfer (OT). Our overall protocol is based on the MPC in the head (MPCitH) paradigm. We additionally provide a concrete, efficient extension of our protocol for the case where C combines algebraic and non-algebraic statements (which is the case in the PoA application). We achieve an asymptotic communication cost of O(log n) plus the proof size of the underlying MPCitH protocol. While related work has similar asymptotic complexity, our approach results in concrete performance improvements. We implement our protocol and provide benchmarks. Concretely, for a set of size 1 million entries, the total run-time of our protocol is 14.89 seconds using 48 threads, with 6.18 MB total communication, which is about 4x faster compared to the state of the art when considering a disjunctive statement with algebraic and non-algebraic elements. 
    more » « less